University of Strathclyde
2025-11-24
We should always minimise suffering
This may mean not performing an experiment at all. Not all new knowledge or understanding is worth the suffering caused to obtain it.
Where there is sufficient justification to perform an experiment, we are ethically obliged to minimise the amount of distress or suffering that is caused, by designing the experiment to achieve this.
Why we need statistics
It may be easy to tell whether an animal is well-treated, or whether an experiment is necessary.
But what is an acceptable (i.e. the least possible) amount of suffering necessary to obtain an informative result?
Quiz question
Suppose you are running a necessary and useful experiment with animal subjects, where the use of animals is morally justified. You are comparing a treatment group to a control group. Which of the following choices will cause the least amount of suffering?
The appropriate number of subjects
The appropriate number of animal subjects to use in an experiment is always the smallest number that - given reasonable assumptions - will satisfactorily give the correct result to the desired level of certainty.
By convention, the usual level of certainty for a hypothesis test is: “we have an 80% chance of getting the correct true/false answer for the hypothesis being tested”
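A minimal sketch of how the 80% convention turns into a number of animals, assuming a two-group comparison of means, a two-sided significance threshold of 0.05, and a normal approximation (a calculation based on the \(t\) distribution would give slightly larger numbers). The function name `animals_per_group` and the effect sizes are hypothetical illustrations, not values from this workshop.

```python
import math
from statistics import NormalDist


def animals_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate animals needed per group to detect a difference in means.

    effect: smallest biologically important difference between group means
    sd:     assumed standard deviation of the measurement within each group
    Uses the normal approximation n = 2 * ((z_alpha + z_power) * sd / effect)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)          # desired power (80% by convention)
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)  # round up: we cannot use a fraction of an animal


# Hypothetical scenarios: a difference of one standard deviation, and half of one
n_large = animals_per_group(effect=1.0, sd=1.0)
n_small = animals_per_group(effect=0.5, sd=1.0)
print(n_large, n_small)
```

Halving the effect we want to detect roughly quadruples the number of animals required, which is why the assumptions behind the calculation need to be stated and justified.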
Experimental design and statistics are intertwined
Once a research hypothesis has been devised:
Design your experiment for…
“For scientific, ethical and economic reasons, experiments involving animals should be appropriately designed, correctly analysed and transparently reported. This increases the scientific validity of the results, and maximises the knowledge gained from each experiment. A minimum amount of relevant information must be included in scientific publications to ensure that the methods and results of a study can be reviewed, analysed and repeated. Omitting essential information can raise scientific and ethical concerns.” (Kilkenny et al. (2009))
We rely on the reporting of the experiment to know if it was appropriate
“Detailed information was collected from 271 publications, about the objective or hypothesis of the study, the number, sex, age and/or weight of animals used, and experimental and statistical methods. Only 59% of the studies stated the hypothesis or objective of the study and the number and characteristics of the animals used. […] Most of the papers surveyed did not use randomisation (87%) or blinding (86%), to reduce bias in animal selection and outcome assessment. Only 70% of the publications that used statistical methods described their methods and presented the results with a measure of error or variability.” (Kilkenny et al. (2009))
We cannot rely on the literature for good examples of experimental design
No publication explained their choice for the number of animals used
We cannot rely on the verbal authority of ‘published scientists’ or ‘experienced scientists’ for good experimental design
“Power analysis or other very simple calculations, which are widely used in human clinical trials and are often expected by regulatory authorities in some animal studies, can help to determine an appropriate number of animals to use in an experiment in order to detect a biologically important effect if there is one. This is a scientifically robust and efficient way of determining animal numbers and may ultimately help to prevent animals being used unnecessarily. Many of the studies that did report the number of animals used reported the numbers inconsistently between the methods and results sections. The reason for this is unclear, but this does pose a significant problem when analysing, interpreting and repeating the results.” (Kilkenny et al. (2009))
Important
As scientists, you - yourselves - need to understand the principles behind the statistical tests you use, in order to choose appropriate tests and methods, and to use appropriate measures to minimise animal suffering and obtain meaningful results.
You cannot simply rely on the word of “experienced scientists” for this.
The following year Kilkenny et al. (2010) proposed the ARRIVE guidelines: a checklist to help researchers report their animal research transparently and reproducibly.
Many journals now routinely request information in the ARRIVE framework, often as electronic supplementary information. The framework covers 20 items including the following (Kilkenny et al. (2010)):
ARRIVE guidelines (highlights)
Warning
“A key step in tackling these issues is to ensure that the next generation of scientists are aware of what makes for good practice in experimental design and animal research, and that they are not led into poor or inappropriate practices by more senior scientists without a proper grasp of these issues.”
Recommended reading
Bate and Clark (2014)
Your experimental measurements are random variables
Important
This does not mean that your measurements are entirely random numbers
Caution
Random variables are quantities whose value is subject to some element of chance, e.g. variation between individuals
The probability distribution of a random variable \(z\) (e.g. what you measure in an experiment) takes on some range of values
The mean of the distribution of \(z\)
The variance of a distribution of \(z\)
A distribution where all values of \(z\) are the same
\[z = \mu_z \implies z - \mu_z = 0 \implies (z - \mu_z)^2 = 0\] \[\textrm{variance} = E((z - \mu_z)^2) = E(0^2) = 0\]
All other distributions
In every other distribution, some values of \(z\) differ from the mean, so for at least some values of \(z\):
\[z \neq \mu_z \implies z - \mu_z \neq 0 \implies (z - \mu_z)^2 \gt 0 \] \[\implies \textrm{variance} = E((z - \mu_z)^2) \gt 0 \]
Standard deviation is the square root of the variance
\[\textrm{standard deviation} = \sigma_z = \sqrt{\textrm{variance}} = \sqrt{E((z - \mu_z)^2)} \]
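The definitions above can be checked numerically. This is a small sketch for a discrete distribution; the probabilities in `dist` are invented for illustration, and `constant` is a distribution where every value of \(z\) is the same.

```python
import math


def mean(dist):
    """Expected value E(z) of a discrete distribution given as {value: probability}."""
    return sum(z * p for z, p in dist.items())


def variance(dist):
    """E((z - mu_z)^2): the expected squared distance from the mean."""
    mu = mean(dist)
    return sum(p * (z - mu) ** 2 for z, p in dist.items())


def standard_deviation(dist):
    """Square root of the variance, in the same units as z."""
    return math.sqrt(variance(dist))


dist = {1: 0.2, 2: 0.5, 3: 0.3}  # hypothetical distribution with some spread
constant = {5: 1.0}              # every value of z is the same

print(mean(dist), variance(dist), standard_deviation(dist))
print(variance(constant))  # no spread, so the variance is zero
```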
Advantages
Note
We can calculate mean, variance, and standard deviation for any probability distribution.
\[ z \sim \textrm{normal}(\mu_z, \sigma_z) \]
Note
We only need to know the mean and standard deviation to define a unique normal distribution
Tip
Measurements of variables whose value is the sum of many small, independent, additive factors may follow a normal distribution
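A quick simulation can illustrate this. The sketch below assumes each measurement is the sum of 100 small independent factors, each drawn uniformly from \([-0.5, 0.5]\) (an arbitrary choice for illustration), and checks how much of the simulated data falls within one and two standard deviations of the mean. For a normal distribution these fractions are about 68% and 95%.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is repeatable


def measurement(n_factors=100):
    """One simulated measurement: the sum of many small, independent factors."""
    return sum(random.uniform(-0.5, 0.5) for _ in range(n_factors))


samples = [measurement() for _ in range(5000)]
mu = statistics.fmean(samples)
sigma = statistics.stdev(samples)

# Fraction of simulated measurements within 1 and 2 standard deviations of the mean
within_1sd = sum(abs(x - mu) <= sigma for x in samples) / len(samples)
within_2sd = sum(abs(x - mu) <= 2 * sigma for x in samples) / len(samples)

print(f"mean={mu:.2f} sd={sigma:.2f}")
print(f"within 1 sd: {within_1sd:.3f}, within 2 sd: {within_2sd:.3f}")
```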
Important
There is no reason to expect that a random variable representing direct measurements in the world will be normally distributed!
Tip
Tip
Suppose you’re taking shots in basketball
Tip
This kind of process generates a random variable approximating a probability distribution called a binomial distribution.
It is different from a normal distribution.
\[ z \sim \textrm{binomial}(n, p) \]
Tip
\[z \sim \textrm{binomial}(20, 0.3) \]
mean and sd
\[ \textrm{mean} = n \times p \] \[ \textrm{sd} = \sqrt{n \times p \times (1-p)}\]
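As a worked check of these formulas for \(\textrm{binomial}(20, 0.3)\), the sketch below computes the theoretical mean and standard deviation and compares them with a simulation of 20 attempts, each succeeding with probability 0.3. The helper `successes` and the number of simulated repeats are illustrative choices.

```python
import math
import random
import statistics

n, p = 20, 0.3

# Theoretical values from the formulas above
theory_mean = n * p
theory_sd = math.sqrt(n * p * (1 - p))

random.seed(42)  # fixed seed so the simulation is repeatable


def successes(n, p):
    """Number of successes in n independent attempts, each with probability p."""
    return sum(random.random() < p for _ in range(n))


simulated = [successes(n, p) for _ in range(10000)]
sim_mean = statistics.fmean(simulated)
sim_sd = statistics.stdev(simulated)

print(f"theory:    mean={theory_mean:.2f} sd={theory_sd:.3f}")
print(f"simulated: mean={sim_mean:.2f} sd={sim_sd:.3f}")
```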
Design note
You need to design your experiments and analyses to reflect the appropriate process/probability distributions of your data. E.g., does \(p\) differ between two conditions?
In prior experiments the frequency of calcium events in WKY was 3.8 \(\pm\) 1.1 events/field/min compared to 18.9 \(\pm\) 7.1 in SHR
This is not normal (or binomial)
Counting how many times an event happens in a fixed interval, when events occur independently at a constant average rate, generates a Poisson distribution.
This is different from a normal or binomial distribution.
\[z \sim \textrm{poisson}(\lambda)\]
Poisson distribution
\[ \textrm{mean} = \lambda \] \[ \textrm{sd} = \sqrt{\lambda} \]
Expectation (\(\lambda\))
Only one parameter is provided, \(\lambda\): the rate at which the measured event happens
Suppose a county has population 100,000, and the average rate of cancer is 45.2 per million people each year
\[z \sim \textrm{poisson}(45.2 \times 100,000/1,000,000) = \textrm{poisson}(4.52) \]
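For the \(\textrm{poisson}(4.52)\) example, the mean, standard deviation, and the probability of any particular count follow directly from \(\lambda\). The sketch below uses the standard Poisson probability formula \(P(z = k) = e^{-\lambda}\lambda^k / k!\); the function name `poisson_pmf` is just a label chosen here.

```python
import math

lam = 4.52  # expected number of cases per year in the county


def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


poisson_mean = lam
poisson_sd = math.sqrt(lam)

p_zero = poisson_pmf(0, lam)                                 # a year with no cases
p_at_most_10 = sum(poisson_pmf(k, lam) for k in range(11))   # 10 or fewer cases

print(f"mean={poisson_mean:.2f} sd={poisson_sd:.3f}")
print(f"P(0 cases)={p_zero:.4f}  P(10 or fewer)={p_at_most_10:.4f}")
```

Because the standard deviation is fixed by \(\lambda\), a Poisson model has no separate parameter for spread, unlike the normal distribution.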
Design note
You need to design your experiments and analyses to reflect the appropriate process/probability distributions of your data
Some important features
Distributions are starting points
Warning
Probability mass
Parameters are unknown numbers that determine a statistical model
A linear regression
\[ y_i = a + b x_i \]
A normal distribution representing your data
\[ z \sim \textrm{normal}(\mu_z, \sigma) \]
An estimand (or quantity of interest) is a value that we are interested in estimating
A linear regression
\[ y_i = a + b x_i\]
These are all estimands, and estimates are represented using the “hat” symbol: \(\hat{a}\), \(\hat{b}\), etc.
A normal distribution representing your data
\[ z \sim \textrm{normal}(\mu_z, \sigma) \]
Note
Important
Tip
Warning
I, and many other statisticians, do not recommend this approach.
However, the concept is widespread and we need to discuss it
A common definition
Statistical significance is conventionally defined as a threshold (commonly, a \(p\)-value less than 0.05) relative to some null hypothesis or prespecified value that indicates no effect is present.
E.g., an estimate may be considered “statistically significant at \(P < 0.05\)” if it:
More generally, an estimate is “not statistically significant” if, e.g.
Most tests rely on probability distributions
The experiment
The hypotheses
The distribution
The null hypothesis
Observed difference between post-treatment levels: \(\bar{y}_T - \bar{y}_C = -1.4\)
We choose a significance threshold in advance
Compare the estimate to the threshold
We choose a significance threshold in advance
Compare the estimate to the threshold
What did not change
What changed
Significance threshold choice
Use two tails if direction of change doesn’t matter
Use one-tailed tests when direction matters
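The choice of tails changes the threshold a test statistic has to cross. This sketch assumes a test statistic that follows a standard normal distribution under the null hypothesis and a significance threshold of 0.05; the observed value `z_observed = 1.8` is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two tails: alpha is split between both ends of the distribution
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)
# One tail: all of alpha sits at one end
one_tailed_crit = std_normal.inv_cdf(1 - alpha)

z_observed = 1.8  # hypothetical test statistic, in the predicted direction

p_two_tailed = 2 * (1 - std_normal.cdf(abs(z_observed)))
p_one_tailed = 1 - std_normal.cdf(z_observed)

print(f"critical values: two-tailed={two_tailed_crit:.3f}, one-tailed={one_tailed_crit:.3f}")
print(f"p-values: two-tailed={p_two_tailed:.4f}, one-tailed={p_one_tailed:.4f}")
```

The same observation falls below the 0.05 threshold with one tail but not with two, which is why the choice must be made, and justified, before the data are seen.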
Warning
It is a common error to summarise comparisons by statistical significance into “significant” and “non-significant” results
Statistical significance is not the same as practical importance
Warning
It is a common error to summarise comparisons by statistical significance into “significant” and “non-significant” results
Non-significance is not the same as zero
The difference between ‘significant’ and ‘not significant’ is not statistically significant
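A short numerical illustration of this point, using invented numbers: two independent studies estimate the same effect as 25 and 10, each with standard error 10. The sketch assumes normally distributed estimates and two-sided tests at 0.05.

```python
import math
from statistics import NormalDist


def two_sided_p(estimate, se):
    """Two-sided p-value for an estimate against a null value of zero."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical estimates and standard errors from two independent studies
est_a, se_a = 25.0, 10.0
est_b, se_b = 10.0, 10.0

p_a = two_sided_p(est_a, se_a)  # "significant" on its own
p_b = two_sided_p(est_b, se_b)  # "not significant" on its own

# Comparing the studies directly: standard errors of independent estimates
# combine as the square root of the sum of their squares
diff = est_a - est_b
se_diff = math.sqrt(se_a**2 + se_b**2)
p_diff = two_sided_p(diff, se_diff)

print(f"study A: p={p_a:.3f}  study B: p={p_b:.3f}  difference: p={p_diff:.3f}")
```

One study crosses the threshold and the other does not, yet the difference between them is well within what chance alone could produce.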
Important
We cannot make an infinite number of measurements of \(z\). We can only take a sample.
The mean and standard deviation we estimate from a sample will not exactly match those of the infinitely large population.
Standard Error (of the Mean)
The standard error of the mean reflects the uncertainty in our estimate of the mean.
When estimating the mean of an infinite population, given a simple random sample of size \(n\), the standard error is:
\[ \textrm{standard error} = \sqrt{\frac{\textrm{Variance}}{n}} = \frac{\textrm{standard deviation}}{\sqrt{n}} = \frac{\sigma}{\sqrt{n}} \]
Tip
Uncertainty in the estimate of the mean \(\mu\) reduces in inverse proportion to the square root of the sample size, \(n\)
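A small sketch of what the square root means in practice, assuming a hypothetical population standard deviation of 2.0: each fourfold increase in sample size only halves the standard error.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean for a simple random sample of size n."""
    return sigma / math.sqrt(n)


sigma = 2.0  # hypothetical population standard deviation

for n in (4, 16, 64, 256):
    print(f"n={n:>3}  standard error={standard_error(sigma, n):.3f}")
```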
Hypothesis test statistics
\[ t = \frac{Z}{s} = \frac{Z}{\sigma/\sqrt{n}} \]
This is true for many hypothesis test methods
One-sample \(t\)-test
\[ t = \frac{Z}{s} = \frac{\bar{X} - \mu}{\hat{\sigma}/{\sqrt{n}}} = \frac{\bar{X} - \mu}{s(\bar{X})} \]
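The one-sample statistic can be computed step by step. The measurements in `sample` and the reference value `mu0` below are invented for illustration; `statistics.stdev` gives the sample standard deviation \(\hat{\sigma}\) (with \(n - 1\) in the denominator).

```python
import math
import statistics

# Hypothetical measurements and null-hypothesis mean (illustrative only)
sample = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.5]
mu0 = 5.0

n = len(sample)
x_bar = statistics.fmean(sample)         # sample mean
sigma_hat = statistics.stdev(sample)     # sample standard deviation
se = sigma_hat / math.sqrt(n)            # standard error of the mean

Z = x_bar - mu0                          # difference from the null value
t = Z / se                               # test statistic

print(f"mean={x_bar:.3f} se={se:.4f} t={t:.3f} (df={n - 1})")
```

The resulting \(t\) would then be compared with the critical value of the \(t\) distribution with \(n - 1\) degrees of freedom.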
Wald test
\[ \sqrt{W} = \frac{Z}{s} = \frac{\hat{\theta} - \theta_0}{s(\hat{\theta})} \]
Hypothesis test statistics
\[ t = \frac{Z}{s} = \frac{Z}{\sigma/\sqrt{n}} \]
What happens if we hold \(Z\) and \(\sigma\) constant and vary sample size?
We reject the null hypothesis when \(t > t_\textrm{crit}\)
The difference (\(Z\)) we need to see to reject the null varies with sample size
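Both effects can be sketched numerically. The values of `Z` and `sigma` are hypothetical, and for simplicity the critical value is taken from the standard normal distribution (about 1.96 for a two-sided 0.05 threshold). This is an approximation: the true \(t_\textrm{crit}\) depends on the degrees of freedom and is larger for small samples.

```python
import math
from statistics import NormalDist


def t_statistic(Z, sigma, n):
    """Test statistic for an observed difference Z with standard deviation sigma."""
    return Z / (sigma / math.sqrt(n))


def min_detectable_difference(sigma, n, crit):
    """Smallest difference Z whose test statistic reaches the critical value."""
    return crit * sigma / math.sqrt(n)


Z, sigma = 1.0, 2.0                 # hypothetical difference and standard deviation
crit = NormalDist().inv_cdf(0.975)  # normal approximation to t_crit (assumption)

for n in (4, 16, 64):
    t = t_statistic(Z, sigma, n)
    z_min = min_detectable_difference(sigma, n, crit)
    print(f"n={n:>2}  t={t:.2f}  reject={t > crit}  smallest Z to reject={z_min:.2f}")
```

With the difference and spread held fixed, the same observation goes from not rejecting to rejecting the null purely because the sample grew, and the difference needed to reject shrinks as \(1/\sqrt{n}\).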
MP968 Experimental Design Workshop